Logo Logo Cranfield
Analysing microbial communities from RestREco project

For analysis of functional profiles for bacteria, we firstly returned to Cranfield’s HPC cluster on Crescent2.

Here, we used QIIME2’s PICRUSt2 package to apply functional predictions to bacterial ASVs after exporting our agglomerated and rarefied feature table generated by phyloseq.

Outputs from PICRUSt included KEGG Orthology and MetaCyc pathways. QZA files generated in PICRUSt2 were imported into ggpicrust2 using qiime2r and converted to data frames.

KO abundance analysis (ggpicrust2)

We used the full ggpicrust2 workflow to analyse the KEGG Orthology outputs generated by PICRUSt2. The workflow (which can be found here) contains multiple steps with outputs including a barplot. However, if there are no significant differences, ggpicrust2 does not provide output

In our analysis, there were no KO pathways which were differentially abundant between control and drought samples. Additionally, no pathways were differentially abundant between sites’ drought samples.

Significant differences in KO pathways between sites were found only when comparing control samples.

Significant pathways from KO abundance analysis (ggpicrust2) - control samples

ALDEx2 was selected for differential abundance testing while FDR was selected as the p-value adjustment method.

Kruskal-Wallis and generalised linear model (GLM) were used by ALDEx2 as the non-parametric and parametric significance tests. Kruskal-Wallis was the more conservative of the tests, finding 3 KO pathways that were differentially abundant between sites control samples while GLM found 145.

As many KEGG annotations are irrelevant to microbial analysis, we chose not to analyse these for the initial report. However, the outputs are provided below.

Significance tables

Barplots

Provided are the barplots generated by ggpicrust2. The plots for GLM use data filtered from the larger dataset to show the top 20 and top 50 differentially abundant pathways between sites.

Kruskal-Wallis

GLM (top 20)

GLM (top 50)

PCA plot - KO abundance

PCA plots showing the clustering of samples can also be generated separately within ggpicrust2. The plot below shows most samples clustering by site, however the samples for Castle Field West and Jemma 9 indicate much within-site variation in the abundances of some pathways.

Heatmap - Kegg pathway expression in samples relative to mean abundances

We also generated heatmaps showing how the level of expression of the different pathways varies between, and within, sites. There appears to be a division between Castle Field West and Lardon Chase in the expression of different pathways in GLM with most of the displayed pathways showing reduced expression in Castle Field West and increased expression in Lardon Chase.

For Kruskal-Wallis, the split appears to be between Castle Field West and Harley Farms for most of the pathways displayed.

GLM

## The Sample Names in order from left to right are:
## P2C1, P2C2, P2C3, P19C1, P19C2, P19C3, P54C1, P54C2, P54C3, P55C1, P55C2, P55C3, P67C1, P67C2, P67C3, P20C1, P20C2, P20C3
## The Group Levels in order from left to right are:
## CW, CW, CW, HF, HF, HF, J6, J6, J6, J9, J9, J9, LC, LC, LC, WF, WF, WF

Kruskal-Wallis

## The Sample Names in order from left to right are:
## P2C1, P2C2, P2C3, P19C1, P19C2, P19C3, P54C1, P54C2, P54C3, P55C1, P55C2, P55C3, P67C1, P67C2, P67C3, P20C1, P20C2, P20C3
## The Group Levels in order from left to right are:
## CW, CW, CW, HF, HF, HF, J6, J6, J6, J9, J9, J9, LC, LC, LC, WF, WF, WF

Most abundant KEGG pathways in control samples

Lastly, we showed the most abundant KEGG pathways in our control samples. They were (from most abundant) (descriptions obtained from KEGG):

  • ko02010: ABC transporters
  • ko00230: Purine metabolism
  • ko03010: Ribosome
  • ko02020: Two-component system
  • ko00190: Oxidative phosphorylation
  • ko00240: Pyrimidine metabolism
  • ko00330: Arginine and proline metabolism
  • ko00620: Pyruvate metabolism
  • ko00720: Other carbon fixation pathways
  • ko00010: Glycolysis/gluconeogenesis

MetaCyc abundance analysis (ggpicrust2)

In our analysis, there were no MetaCyc pathways which were differentially abundant between control and drought samples. Additionally, no pathways were differentially abundant between sites’ control samples.

However, significant differences in MetaCyc pathways between sites were found when comparing drought samples.

Significant pathways from MetaCyc pathway abundance analysis (ggpicrust2) - drought samples

The tests and p-value adjustment methods used for KO pathways were also used for the MetaCyc pathways.

Twelve differentially abundant pathways were identified as significant between sites by the Kruskal-Wallis test and 264 by the generalised linear model (GLM).

Significance tables

Barplots

As the barplots generated by ggpicrust2 were used for the main analysis, these plots just show the top 3 differentially abundant MetaCyc pathways for each significance test. A table of all the significant pathways are above.

The top three most abundant pathways that were significantly different between sites for Kruskal-Wallis were peptidoglycan synthesis IV (p-adjust=0.041), purine ribonuclesides degradation (p-adjust=0.042) and methylphosphonate degradation (p-adjust=0.044).

Other interesting differentially abundant pathways which appear in the main tables were fatty acid oxidation and fatty acid salvage (both p-adjusted=0.048).

For GLM, the top pathways were aerobic respiration I (cytochrome c) (p-adjusted = 0.003) and L-isoleucine biosynthesis II (p-adjusted = 0.003).

Also differentially abundant were pathways associated with the TCA cycle, fatty acid beta oxidation, fatty acid salvage, and amino acid synthesis. Notably, GLM analysis reported all of the pathways identified by Kruskal-Wallis.

Many of these pathways were associated with the acquisition of energy or drought defence. This potentially indicates a drought response by bacterial communities. For example, fatty acid beta oxidation produces acetyl-CoA which is needed for the TCA cycle. Amino acid pathways also appeared in our results. Bacteria can use amino acids as osmolytes during periods of osmotic stress.

Kruskal-Wallis

GLM (top 3)

PCA plot - MetaCyc pathway

Looking at the PCA plot, we can see that replicas on three sites – Jemma 6, Jemma 9 and Harley Farms – clustered together while the remaining three sites showing greater functional diversity between replicas.

Heatmap - Metacyc pathway expression in samples relative to mean abundances

Heatmaps show the top five most significant pathways for both Kruskal-Wallis and GLM. For Kruskal-Wallis the activities of the purine ribonucleosides degradation, peptidoglycan biosynthesis IV and glycerol degradation to butanol pathways correlated to each other: increased in the samples for Lardon Chase and Harley Farms and decreased in the remaining sites.

For methylphosphonate degradation pathway, an even split between sites was observed with half the sites showing slightly above average expression of the pathway and the remainder slightly below average.

For GLM, the aerobic respiration pathway showed strong differences in abundance between samples for Lardon Chase and Castle Field West, being strongly expressed in Lardon Chase (Figure 18). Interestingly, the activities of the pathways associated with amino acid biosynthesis correlated to each other.

GLM

## The Sample Names in order from left to right are:
## P2S1, P2S2, P2S3, P19S1, P19S2, P19S3, P54S1, P54S2, P54S3, P55S1, P55S2, P55S3, P67S1, P67S2, P67S3, P20S1, P20S2, P20S3
## The Group Levels in order from left to right are:
## CW, CW, CW, HF, HF, HF, J6, J6, J6, J9, J9, J9, LC, LC, LC, WF, WF, WF

Kruskal-Wallis

## The Sample Names in order from left to right are:
## P2S1, P2S2, P2S3, P19S1, P19S2, P19S3, P54S1, P54S2, P54S3, P55S1, P55S2, P55S3, P67S1, P67S2, P67S3, P20S1, P20S2, P20S3
## The Group Levels in order from left to right are:
## CW, CW, CW, HF, HF, HF, J6, J6, J6, J9, J9, J9, LC, LC, LC, WF, WF, WF

Most abundant MetaCyc pathways in shelter samples

As with the KO pathways, we also plotted the most abundant MetaCyc abundance pathways. We obtained details on these from the MetaCyc database

  • PWY-3781: Aerobic respiration
  • PWY-7111: Pyruvate fermentation to isobutanol (engineered)
  • PWY-5101: L-isoleucine biosynthesis II
  • ILEUSYN-PWY: L-isoleucine biosynthesis I (from threonine)
  • VALSYN-PWY: L-valine biosynthesis
  • PWY-7094: Fatty acid salvage
  • BRANCHED-CHAIN-AA-SYN-PWY: Superpathway of branched chain amino acid biosynthesis
  • FAO-PWY: Fatty acid beta-oxidation
  • NONOXIPENT-PWY: Pentose phosphate pathway (non-oxidative branch) I
  • PWY-5103: L-isoleucine biosynthesis III

Functional analysis - fungi (FUNGuildR)

To assign functions to specific fungal taxa, we used FUNGuild classifications via FUNGuildR. The feature table from the phyloseq step was exported from R and a custom Python script was used to put the table in the correct format. The Python script (funguild_process.py) can be found in the GitHub repository under tools. The modified feature table was then imported into R.

After classification, any guilds classified as NA were changed to ‘unassigned’ while taxa with multiple assigned guilds were classified as ’non-specific.

Top 10 most abundant guilds

Firstly, we plotted the top 10 most abundant guilds according to treatment (control and drought).

FUNGuild classifications showed similar guilds were prevalent in the control and shelter samples. Undefined saprotrophs dominated in both control and shelter.

Dung saprotrophs, plant pathogens and wood saprotrophs were also present in modest numbers. For the shelter samples, insect pathogens were among the top 10 most abundant guilds, but not in the top 10 guilds for the controls.

Control

Shelter

Guilds grouped by site-treatment

When we plotted the relative abundances of the guilds present in each group (site-treatment), we saw site-dependent changes in relative abundances of dung saprotrophs between control and drought samples. We also noted that animal parasite abundances in drought samples increased on most sites.

Guilds grouped by treatment

No major shifts in the most abundant guilds were observed in relation to water stress.

Importantly around 33% of taxa had not been assigned a guild or function, while a further 19% of taxa had non-specific guilds (ie. multiple guilds identified during the processing stage). Poor taxonomic and functional classification has been identified in other studies. This highlights the limitations of current bioinformatics tools and resources for assessments of soil fungal communities.

The dominance of undefined and dung saprotrophs in our study is not surprising given they are among the major functional groups found in ecosystems.

The increased abundances of parasitic and pathogenic fungi was interesting to note. There is evidence in the literature that these guilds can increase in drought conditions so this may be occurring here.

Guild alpha diversity

As many of the changes observed were subtle (although still noticeable) we wanted to check if guild alpha diversity was significant using the Wilcoxon signed-rank test. As expected, it was not significant.

## 
##  Wilcoxon signed rank exact test
## 
## data:  split.func_richness[["Control"]][["Shannon"]] and split.func_richness[["Drought"]][["Shannon"]]
## V = 7, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
## 
##  Wilcoxon signed rank exact test
## 
## data:  split.func_richness[["Control"]][["Simpson"]] and split.func_richness[["Drought"]][["Simpson"]]
## V = 8, p-value = 1
## alternative hypothesis: true location shift is not equal to 0